Extracting synonymous gene and protein terms from biological literature
Authors
Abstract
MOTIVATION: Genes and proteins are often associated with multiple names, and more names are added as new functional or structural information is discovered. Because authors can use any of the known names for a gene or protein, information retrieval and extraction would benefit from identifying the gene and protein terms that are synonyms for the same substance.
RESULTS: We explored four complementary approaches for extracting gene and protein synonyms from text: unsupervised, partially supervised, and supervised machine-learning techniques, and a manual knowledge-based approach. We report the results of a large-scale evaluation of these alternatives over an archive of biological journal articles. Our evaluation shows that our extraction techniques could be a valuable supplement to resources such as SWISSPROT, as our systems were able to capture gene and protein synonyms not listed in the SWISSPROT database.
Similar resources
Mapping of TP53 protein network using cytoscape software
TP53 acts as a tumor suppressor in cancer. It induces cell cycle arrest or apoptosis in response to cellular stress and damage. p53 gene alteration could cause uncontrolled cell proliferation. In the present study, we used the TP53 gene as the seed in the construction of a protein-protein functional association network to identify genes that might be involved in the tumorigenesis process with TP53. TP53 prot...
Identification and prioritization genes related to Hypercholesterolemia QTLs using gene ontology and protein interaction networks
Gene identification represents the first step to a better understanding of the physiological role of the underlying protein and disease pathways, which in turn serves as a starting point for developing therapeutic interventions. Familial hypercholesterolemia is a hereditary metabolic disorder characterized by high low-density lipoprotein cholesterol levels. Hypercholesterolemia is a quantitativ...
Comprehensive Computational Analysis of Protein Phenotype Changes Due to Plausible Deleterious Variants of Human SPTLC1 Gene
Genetic variations found in the coding and non-coding regions of a gene are known to influence the structure as well as the function of proteins. Serine palmitoyltransferase long chain subunit 1, a member of the α-oxoamine synthase family, is encoded by the SPTLC1 gene and is a subunit of the enzyme serine palmitoyltransferase (SPT). Mutations in SPTLC1 have been associated with hereditary sensory and auto...
Bioinformatic and empirical analysis of a gene encoding serine/threonine protein kinase regulated in response to chemical and biological fertilizers in two maize (Zea mays L.) cultivars
The molecular structure of a gene, ZmSTPK1, encoding a serine/threonine protein kinase in maize was analyzed with bioinformatic tools, and its expression pattern was studied under chemical and biological fertilizers. Bioinformatic analysis revealed that ZmSTPK1 is located on chromosome 10, from position 141015332 to 141017582. The full genomic sequence of the gene is 2251 bp in length and includes 2 exons. ...
Phylogenetic analysis and genetic variation of Tomato yellow leaf curl virus based on the V1 gene in Iraq
Tomato yellow leaf curl virus (TYLCV) is a major pathogen in tropical and subtropical areas. During 2014-2015, a total of 393 tomato samples showing Tomato yellow leaf curl disease (TYLCD) symptoms were collected from six different provinces of Iraq. In serological assays, 55 out of 393 samples (14%) reacted positively with TYLCV-specific antibodies. The presence of TYLCV was verified in 21 (...
Journal title:
- Bioinformatics
Volume 19 Suppl 1
Pages -
Publication date 2003